Dynamic MiniLMv2 L6 H384 SQuAD1.1 Int8 Static
MIT
QuaLA-MiniLM is a compact language model developed by Intel that combines knowledge distillation, length-adaptive transformers, and 8-bit quantization. On the SQuAD1.1 dataset it achieves up to an 8.8x speedup with less than 1% accuracy loss.
Large Language Model
Transformers
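
The description mentions 8-bit quantization without showing what it involves. As a rough illustration only, the sketch below maps floating-point values to signed 8-bit integer codes with a single symmetric scale and then restores them. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and this is not claimed to be the scheme Intel used for this model.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (illustrative assumption, not the model's actual scheme).

    Returns integer codes in [-127, 127] and the scale used to produce them.
    """
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Hypothetical sample weights, chosen only for demonstration.
weights = [0.5, -1.27, 0.0, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per value is bounded by half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Storing each value in one byte instead of four is where the memory and speed gains of 8-bit inference come from; the price is the small rounding error measured by `max_error`.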